OFRewind: Enabling Record and Replay Troubleshooting for Networks
Abstract
Debugging operational networks can be a daunting task due to their size, distributed state, and the presence of black-box components such as commercial routers and switches, which are poorly instrumentable and only coarsely configurable. The debugging tool set available to administrators is limited and provides only aggregated statistics (SNMP), sampled data (NetFlow/sFlow), or local measurements on single hosts (tcpdump). In this paper, we leverage split forwarding architectures such as OpenFlow to add record and replay debugging capabilities to networks, a powerful yet currently lacking approach. We present the design of OFRewind, which enables scalable, multi-granularity, temporally consistent recording and coordinated replay in a network, with fine-grained, dynamic, centrally orchestrated control over both record and replay. Thus, OFRewind helps operators to reproduce software errors, identify data-path limitations, or locate configuration errors.
Similar resources
CERIAS Tech Report 2015-12 Software and Hardware Approaches for Record and Replay of Wireless Sensor Networks
Tan Creti, Matthew Edward Ph.D., Purdue University, August 2015. Software and Hardware Approaches for Record and Replay of Wireless Sensor Networks. Major Professor: Saurabh Bagchi. Wireless Sensor Networks (WSNs) are used in a wide variety of applications including environmental monitoring, electrical grids, and manufacturing plants. WSNs are plagued by the possibility of bugs manifesting only...
Processor-Oblivious Record and Replay
Record-and-replay systems are useful tools for debugging non-deterministic parallel programs by first recording an execution and then replaying that execution to produce the same access pattern. Existing record-and-replay systems generally target thread-based execution models, and record the behaviors and interleavings of individual threads. Dynamic multithreaded languages and libraries, such a...
DCR: Replay-Debugging for the Datacenter
We’ve built a tool for debugging non-deterministic failures in production datacenter applications. Our system, called DCR, is the first to efficiently record and replay large scale, distributed, and data-intensive systems such as HDFS/GFS, HBase/Bigtable, and Hadoop/MapReduce. The enabling idea behind DCR is that debugging doesn’t require a precise replica of the original datacenter run. Instea...
Record/Play in the Presence of Benign Data Races
In this article we present our experience with the integration of record/replay in the Jalapeño virtual machine. The goal of record/replay is to be able to faithfully replay an application. Previous work in Jalapeño focused on the replay of Java applications on uni-processors. Here we describe additional work done to obtain replay with low intrusion on multi-processor systems by doing `orde...
Publication date: 2011